Part 3: Mechanics
Inspect & Import data
R tries to import the first sheet of the excel file which resolves in an error. This is why the argument read_excel function has to be used to specify the column.
# Inspect sheets of excel-file
excel_sheets('C:/Users/LK/Nextcloud7/Personal/Docs/case-studies/Air France/assets/Air France Case Spreadsheet Supplement.xls')
## [1] "DoubleClick" "Copyright" "Kayak"
# Import data
kayak <- read_excel("C:/Users/LK/Nextcloud7/Personal/Docs/case-studies/Air France/assets/Air France Case Spreadsheet Supplement.xls",
sheet = "Kayak")
doubleclick <- read_excel("C:/Users/LK/Nextcloud7/Personal/Docs/case-studies/Air France/assets/Air France Case Spreadsheet Supplement.xls",
sheet = "DoubleClick")
Massaging
#Convert to dataframe
doubleclick <- as.data.frame(doubleclick)
# Get a big picture understanding of the data
summary(doubleclick)
## Publisher ID Publisher Name Keyword ID Keyword
## Length:4510 Length:4510 Length:4510 Length:4510
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Match Type Campaign Keyword Group Category
## Length:4510 Length:4510 Length:4510 Length:4510
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Bid Strategy Keyword Type Status Search Engine Bid
## Length:4510 Length:4510 Length:4510 Min. : 0.000
## Class :character Class :character Class :character 1st Qu.: 3.384
## Mode :character Mode :character Mode :character Median : 6.250
## Mean : 5.435
## 3rd Qu.: 6.250
## Max. :27.500
## Clicks Click Charges Avg. Cost per Click Impressions
## Min. : 0.0 Min. : 0.00 Min. : 0.000 Min. : 0
## 1st Qu.: 1.0 1st Qu.: 2.31 1st Qu.: 0.825 1st Qu.: 28
## Median : 4.0 Median : 6.76 Median : 1.650 Median : 176
## Mean : 113.7 Mean : 167.48 Mean : 1.890 Mean : 9284
## 3rd Qu.: 19.0 3rd Qu.: 28.49 3rd Qu.: 2.663 3rd Qu.: 844
## Max. :34012.0 Max. :46188.44 Max. :10.000 Max. :8342415
## Engine Click Thru % Avg. Pos. Trans. Conv. % Total Cost/ Trans.
## Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 1.532 1st Qu.: 1.143 1st Qu.: 0.0000 1st Qu.: 0.00
## Median : 4.106 Median : 1.594 Median : 0.0000 Median : 0.00
## Mean : 11.141 Mean : 1.930 Mean : 0.5693 Mean : 27.61
## 3rd Qu.: 10.917 3rd Qu.: 2.308 3rd Qu.: 0.0000 3rd Qu.: 0.00
## Max. :200.000 Max. :15.000 Max. :900.0000 Max. :9597.17
## Amount Total Cost Total Volume of Bookings
## Min. : 0 Min. : 0.00 Min. : 0.0000
## 1st Qu.: 0 1st Qu.: 2.31 1st Qu.: 0.0000
## Median : 0 Median : 6.76 Median : 0.0000
## Mean : 1034 Mean : 167.48 Mean : 0.8734
## 3rd Qu.: 0 3rd Qu.: 28.49 3rd Qu.: 0.0000
## Max. :567463 Max. :46188.44 Max. :439.0000
#Look for weird stuff
table(doubleclick$`Match Type`)
##
## Advanced Broad Exact N/A Standard
## 969 2591 22 48 880
# The NAs have to be removed.
doubleclick_clean <- na.omit(doubleclick)
# Notice how the number of rows gets reduced
print(nrow(doubleclick_clean))
## [1] 3286
# Check the frequency counts of the campaigns that should be analyzed
table(doubleclick_clean$Campaign)
##
## Air France Brand & French Destinations Air France Branded
## 99 18
## Air France Global Campaign Business Class
## 264 15
## French Destinations General Terms
## 83 1
## Geo Targeted Boston Geo Targeted Chicago
## 115 45
## Geo Targeted DC Geo Targeted Detroit
## 118 83
## Geo Targeted Houston Geo Targeted Los Angeles
## 88 104
## Geo Targeted Miami Geo Targeted New York
## 33 233
## Geo Targeted Philadelphia Geo Targeted San Francisco
## 69 83
## Geo Targeted Seattle Google_Yearlong 2006
## 33 480
## Outside Western Europe Paris & France Terms
## 14 102
## Unassigned Western Europe Destinations
## 919 287
# Filter out the Unassigned campaigns
doubleclick_clean <- doubleclick_clean[-grep("Unassigned", doubleclick_clean$Campaign),]
ggplot(data=doubleclick_clean, aes(x=doubleclick_clean$`Clicks`, y=doubleclick_clean$`Impressions`, color=doubleclick_clean$`Match Type`)) + geom_point()

Campaigns
# Find out aggregated Costs per campaign
costs_campaigns <- aggregate(doubleclick_clean$`Total Cost`, by=list(doubleclick_clean$Campaign), FUN=sum)
# Find out aggregated revenue per campaign
revenue_campaigns <- aggregate(doubleclick_clean$Amount, by=list(doubleclick_clean$Campaign), FUN=sum)
roi_campaign <- (revenue_campaigns$x/costs_campaigns$x)
# Which publisher gets used the most
plot_ly(x = doubleclick_clean$'Publisher Name', type = "histogram")
# Find out aggregated Costs per publisher
costs_publisher <- aggregate(doubleclick_clean$`Total Cost`, by=list(doubleclick_clean$`Publisher Name`), FUN=sum)
# Find out aggregated Costs per matchtype
costs_matchtype <- aggregate(doubleclick_clean$`Total Cost`, by=list(doubleclick_clean$`Match Type`), FUN=sum)
# Calculate costs per booking
doubleclick_clean$'Costs per Booking' <- doubleclick_clean$`Total Cost`/doubleclick_clean$`Total Volume of Bookings`
Visibility
# Visualize impressions per publisher
plot_ly(doubleclick_clean, x = doubleclick_clean$`Publisher Name`, y=~Impressions)
## No trace type specified:
## Based on info supplied, a 'bar' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#bar
# Visualize impressions per campaign
plot_ly(doubleclick_clean, x = doubleclick_clean$`Campaign`, y=~Impressions, type='bar')
Volume of Bookings
bookings_campaigns <- aggregate(doubleclick_clean$`Total Volume of Bookings`, by=list(doubleclick_clean$Campaign), FUN=sum)
plot_ly(bookings_campaigns, x=~Group.1, y=~x)
## No trace type specified:
## Based on info supplied, a 'bar' trace seems appropriate.
## Read more about this trace type -> https://plotly.com/r/reference/#bar
Most of the impressions got generated by unassigned keywords!
##Descriptive
Plots Statistics Correlation Association
Predictive
Feature Selection Apply ML-Algorithmus
Message
Key Findings
The C-suite of ___ face the following (problem/challenge), which is best solved with _ (solution) having an impact and/or making profits via ___ . The unique advantages/differentiators of the MVP are ____ , when comparing with the following key competitors / alternatives: ___
Next steps(What needs to be done!)
- Do branded keywords bring in more revenue?
- Are broad or focused keywords more profitable?
- Can assist keywords help increase conversion rate?